ABSTRACT

A central issue in programming practice involves determining the appropriate size and information content of a software module. This study attempted to determine the effectiveness of two widely used criteria for software modularization, strength and size, in reducing fault rate and development cost. Data from 453 FORTRAN modules developed by professional programmers were analyzed. The results indicated that module strength is a good criterion with respect to fault rate, whereas arbitrary module size limitations inhibit programmer productivity. This analysis is a first step toward defining empirically based standards for software modularization.


INTRODUCTION 

The module is the basic unit of software development, maintenance, and management. A basic activity of the software design process is the partitioning of the software specification into a number of program modules that together satisfy the original problem statement. To do this, programmers need criteria for defining the information content and organization of modules.

The major theoretical criteria for software modularization include strength/cohesion and coupling [1] and information hiding [2]. These criteria are, however, difficult to quantify. An independent observer of the development process cannot easily determine the levels of strength, coupling, and information hiding achieved in any given module. The use of these concepts is thus limited in an environment where quality assurance (as adherence to standards) is stressed.

Measures of size (number of source lines of code or executable statements) have consequently been adopted as a simple expedient [3]. Although many benefits have been claimed for module size limitations, at present there is no theoretical basis or empirical evidence for using module size as a criterion for software modularization [4].


The purpose of this study was to compare the effectiveness of size (e.g., a 60-line module standard) and a theoretically based measure (strength) as criteria for software modularization. Strength (or singleness of purpose) was chosen for this comparison because, like size, it can be determined from the contents of a single module. Measuring coupling or information hiding requires that more than one module at a time be examined.

This study, therefore, compares the effectiveness of module strength and size criteria with respect to module cost and fault rate. Although maintainability (or modifiability) is another important software attribute, it was not possible to measure or analyze it in this study. Because some programmers generally produce low-fault, low-cost modules while others produce expensive, fault-prone modules, it was also necessary to investigate the interaction of these criteria with individual programmer performance.

DATA ANALYZED 

This study examines data from 453 new FORTRAN modules developed by 26 professional programmers for 5 major software development projects. The term "module" has been defined in many different ways. For the purposes of this study, it refers to a FORTRAN subroutine, or the smallest program unit that is independently compilable. Although more sophisticated languages are available, many organizations rely on FORTRAN for scientific computing applications. This study is thus relevant to current practice. Furthermore, these modularization criteria seem likely to remain important considerations in software development using new languages such as Ada† (for which extensive data are not yet available).

The Software Engineering Laboratory [5] (SEL) collected these data as part of an ongoing program of software measurement and technology evaluation. The SEL is a research project sponsored by the National Aeronautics and Space Administration/Goddard Space Flight Center (NASA/GSFC) and supported by Computer Sciences Corporation and the University of Maryland. The SEL studies software developed for spacecraft flight dynamics applications. These systems provide ground-based support for spacecraft navigation and control. Typical projects produce from 30,000 to 150,000 source lines of code.

†Ada is a registered trademark of the U.S. Government, Ada Joint Program Office.

Module Strength 

Myers [6] defines seven levels of module strength. In descending order, these are functional, informational, communicational, procedural, classical, logical, and coincidental. A high-strength (functional) module performs a single well-defined function. Myers contends that high-strength modules are superior to low-strength modules. Although it was not possible to test this theory exactly, a reasonable approximation was made. Some recent attempts to develop objective measures of module strength [7, 8] seem promising, but they are not (in their present forms) easily applied. Consequently, they were not employed in this study.

Instead, programmers determined the strength of a module using a checklist. Programmers rated each module they developed as performing one or more of the following functions: input/output, logic/control, and algorithmic processing. Distinguishing the types of functions seemed to be a less ambiguous task than identifying the number of functions, because the number of functions depends on the level of decomposition recognized by the respondent. Performing a single function type is a necessary (but not sufficient) condition for high module strength.

Those modules described as having only one function were classified as high strength; those described as having two functions were classified as medium strength; and those described as having three or more functions were rated low strength. Table 1 summarizes the results of this classification process.


Table 1. Module Strength Distribution

MODULE      NUMBER OF         MEAN EXECUTABLE   MEAN DECISIONS PER
STRENGTH    FORTRAN MODULES   STATEMENTS        EXECUTABLE STATEMENT

LOW          90               77                0.29
MEDIUM      176               80                0.32
HIGH        187               48                0.32
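The checklist rule described above can be expressed as a small classifier. The Python sketch below is a minimal illustration, assuming each module's checklist answers are recorded as a set of function-type labels; `classify_strength`, `Strength`, and `FUNCTION_TYPES` are hypothetical names, not part of the SEL data collection forms. The ordinal values (low = 0, high = 2) are an assumption chosen so that strength classes can be cross-tabulated in order.

```python
from enum import Enum

# The three function types named on the checklist (label spellings are illustrative).
FUNCTION_TYPES = frozenset({"input/output", "logic/control", "algorithmic processing"})


class Strength(Enum):
    LOW = 0     # three function types checked
    MEDIUM = 1  # two function types checked
    HIGH = 2    # one function type checked


def classify_strength(checked) -> Strength:
    """Map the function types checked for a module to a strength class."""
    checked = set(checked)
    unknown = checked - FUNCTION_TYPES
    if unknown:
        raise ValueError(f"unknown function types: {sorted(unknown)}")
    if not checked:
        raise ValueError("a module must perform at least one function type")
    if len(checked) == 1:
        return Strength.HIGH
    if len(checked) == 2:
        return Strength.MEDIUM
    return Strength.LOW


# Usage with hypothetical checklist answers.
print(classify_strength({"algorithmic processing"}))         # Strength.HIGH
print(classify_strength({"input/output", "logic/control"}))  # Strength.MEDIUM
```

Counting function types rather than functions mirrors the study's reasoning: the type count does not depend on how finely the respondent decomposes the module.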


Module Size 

The 453 modules in the sample were classified 
into three approximately equal ordered groups on 
the basis of the number of executable statements 
in each module. Table 2 shows the results of 
this classification. 

The largest module in the sample contained 267 executable statements. The dividing line of 31 executable statements is significant because, in the environment studied, it corresponds to about 60 source lines of code. Many programming standards [3] limit module size to one page (or 50 to 60 source lines of code). The informal guideline used in this environment is that no module should exceed 2 pages (about 64 executable statements). Military standards on module size range from 50 to 200 executable statements [4]. One purpose of the study was to test the validity of such standards, in general, and, in particular, to determine if the local guideline should be strengthened.


Table 2. Module Size Distribution

MODULE      NUMBER OF         EXECUTABLE        MEAN DECISIONS PER
SIZE        FORTRAN MODULES   STATEMENTS        EXECUTABLE STATEMENT

SMALL       154               1 TO 31           0.31
MEDIUM      148               32 TO 64          0.31
LARGE       151               65 OR MORE        0.32
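The Table 2 boundaries can likewise be written as a classifier. This is a minimal Python sketch, assuming size is measured in executable statements as in Table 2; `Size`, `classify_size`, and `TABLE_2_COUNTS` are hypothetical names. It also uses the Table 2 counts to show that 302 of the 453 modules (two-thirds) contain 64 or fewer executable statements, that is, fall within the local two-page guideline.

```python
from enum import Enum


class Size(Enum):
    SMALL = 0   # 1 to 31 executable statements
    MEDIUM = 1  # 32 to 64 executable statements
    LARGE = 2   # 65 or more executable statements


def classify_size(executable_statements: int) -> Size:
    """Assign a module to its Table 2 size class."""
    if executable_statements < 1:
        raise ValueError("a module must contain at least one executable statement")
    if executable_statements <= 31:
        return Size.SMALL
    if executable_statements <= 64:
        return Size.MEDIUM
    return Size.LARGE


# Module counts per size class, copied from Table 2.
TABLE_2_COUNTS = {Size.SMALL: 154, Size.MEDIUM: 148, Size.LARGE: 151}

total = sum(TABLE_2_COUNTS.values())
within_guideline = TABLE_2_COUNTS[Size.SMALL] + TABLE_2_COUNTS[Size.MEDIUM]
share = within_guideline / total

print(classify_size(267))       # Size.LARGE (the largest module in the sample)
print(total, within_guideline)  # 453 302
print(f"{share:.1%}")           # 66.7%
```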


ANALYSIS RESULTS 


The objective of the analysis was to determine the effect of module size and strength criteria on quality measures, that is, the module cost (number of hours per executable statement) and fault rate (number of faults per executable statement). An initial examination of the data revealed that neither module cost nor fault rate was normally distributed. Figures 1 and 2 illustrate these phenomena. Consequently, the authors adopted contingency table and nonparametric correlation approaches to the analysis rather than relying on normal-distribution-based techniques such as regression and analysis of variance.

To perform the contingency table analysis, every module was assigned to one of three ordered classes (of nearly equal size) for each of the quality measures of cost (low, medium, high) and fault rate (zero, medium, high). The values 0.151 and 0.322 programmer hours per executable statement divided the modules into the three cost classes (i.e., 0.151 or less was low cost). Faults were counted for each module from the completion of unit testing until the end of acceptance testing. The value 0.045 faults per executable statement distinguished between the medium- and high-fault-rate classes; the third class consisted of those modules with no faults. It was thus possible to form a series of 3-by-3 tables, each comparing classes of module strength or size with classes of module cost or fault rate.
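The classification and cross-tabulation step can be sketched in Python as follows. This is a minimal illustration, assuming each module record supplies its strength class (0 = low to 2 = high), executable-statement count, fault count, and programmer hours; all function and variable names are hypothetical, the module data are invented, and the handling of values exactly at 0.322 or 0.045 is an assumption because the text only fixes the 0.151 boundary.

```python
# Class boundaries quoted in the text. Values exactly at 0.322 or 0.045 are
# placed in the lower class here; that choice is an assumption.
LOW_COST_MAX = 0.151      # programmer hours per executable statement
MEDIUM_COST_MAX = 0.322
MEDIUM_FAULT_MAX = 0.045  # faults per executable statement


def cost_class(hours: float, statements: int) -> int:
    """Return 0 (low), 1 (medium), or 2 (high) cost per executable statement."""
    rate = hours / statements
    if rate <= LOW_COST_MAX:
        return 0
    if rate <= MEDIUM_COST_MAX:
        return 1
    return 2


def fault_class(faults: int, statements: int) -> int:
    """Return 0 (zero faults), 1 (medium), or 2 (high) fault rate."""
    if faults == 0:
        return 0
    return 1 if faults / statements <= MEDIUM_FAULT_MAX else 2


def contingency_table(row_classes, col_classes, k=3):
    """Cross-tabulate two equal-length sequences of ordinal classes 0..k-1."""
    if len(row_classes) != len(col_classes):
        raise ValueError("class sequences must have the same length")
    table = [[0] * k for _ in range(k)]
    for r, c in zip(row_classes, col_classes):
        table[r][c] += 1
    return table


# Hypothetical modules: (strength class, executable statements, faults).
modules = [(2, 40, 0), (2, 30, 1), (1, 80, 2), (0, 100, 6)]
strength = [s for s, _, _ in modules]
faults = [fault_class(f, n) for _, n, f in modules]
print(contingency_table(strength, faults))  # [[0, 0, 1], [0, 1, 0], [1, 1, 0]]
```

Rows follow strength from low to high and columns follow fault rate from zero to high, so the table keeps both orderings that the gamma statistic requires.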

The strength of relationships was assessed by calculating the gamma (γ) correlation statistic [9] between the ordered classes of modularization criteria and quality measures. This statistic varies from -1.0 to +1.0. For example, a perfect negative correlation (-1.0) would result if all high-strength modules had zero faults, all medium-strength modules had medium fault rates, and all low-strength modules had high fault rates. Variations in programmer performance also affect module cost and fault rate [10]; therefore, this factor was considered both in the general analysis and in a subsequent analysis.
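Gamma can be computed directly from a 3-by-3 table by counting concordant pairs of modules (ordered the same way on both variables) and discordant pairs (ordered in opposite ways), ignoring ties. The Python sketch below is a minimal implementation of the Goodman-Kruskal gamma, assuming rows and columns are both listed in increasing order; it is not the statistical package the authors used, and it omits any significance test.

```python
def pair_counts(table):
    """Count concordant and discordant pairs in a table with ordered rows and columns.

    A pair is concordant if one observation is higher on both variables and
    discordant if it is higher on one and lower on the other; ties are skipped.
    """
    concordant = discordant = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            n_ij = table[i][j]
            if n_ij == 0:
                continue
            for k in range(i + 1, rows):  # only rows strictly above row i
                for m in range(cols):
                    if m > j:
                        concordant += n_ij * table[k][m]
                    elif m < j:
                        discordant += n_ij * table[k][m]
    return concordant, discordant


def gamma(table):
    """Goodman-Kruskal gamma: (C - D) / (C + D)."""
    c, d = pair_counts(table)
    if c + d == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (c - d) / (c + d)


# Rows: strength low, medium, high. Columns: fault rate zero, medium, high.
# This is the perfectly negative arrangement described in the text.
perfect_negative = [[0, 0, 30],
                    [0, 30, 0],
                    [30, 0, 0]]
print(gamma(perfect_negative))  # -1.0
```

Because tied pairs are dropped, gamma reaches -1.0 whenever no concordant pairs remain, which is why a modest-looking coefficient can still reflect a clear shift between classes.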
General Results

Initially, module strength and size were cross-tabulated with cost and fault rate. Lines 1 and 4 of Table 3 list the correlation coefficients obtained from this analysis. Significant relationships were found between module strength and fault rate (γ = -0.35) and between module size and cost (γ = -0.31). The criterion for significance (probability of error less than 0.001) is very conservative. These correlations seem low, but Figures 3 and 4 provide better illustrations of the magnitude of these relationships. Fully 50 percent of high-strength modules were fault-free while only 18 percent of low-strength modules were fault-free. Similarly, 46 percent of large modules fell into the lowest cost class, whereas just 22 percent of the small modules were rated as low cost.

Figure 1. Distribution of Cost (horizontal axis: hours per executable statement)

Figure 2. Distribution of Faults


Table 3. Contingency Table Results

                                CORRELATIONS (a)
CRITERIA          EFFECT        FAULT RATE   COST RATE   LINE
                  CONTROLLED

MODULE STRENGTH   NONE          -0.35 (b)    -0.19        1
                  SIZE          -0.32 (b)    -0.27 (b)    2
                  PROGRAMMER    -0.21         0.10        3
MODULE SIZE       NONE           0.20        -0.31 (b)    4
                  STRENGTH       0.19        -0.38 (b)    5
                  PROGRAMMER     0.27 (b)    -0.41 (b)    6

(a) GAMMA (γ) STATISTIC.
(b) PROBABILITY LESS THAN 0.001 THAT CORRELATION IS ACTUALLY ZERO.


Table 1 indicates, however, that module strength and size might be related to each other. Low-strength modules tend to be larger. Lines 2 and 5 of Table 3 show the (partial) correlations obtained for module strength and size individually while controlling (removing) the effect of the other. The relationships with module fault rate remain essentially unchanged. There is, however, some interaction between module strength and size with respect to module cost. (Compare line 1 versus line 2 and line 4 versus line 5 in Table 3.)
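One standard way to compute a partial gamma is to split the modules into classes of the control variable, count concordant and discordant pairs within each class, and pool the counts before forming (C - D) / (C + D). The text does not state which partial-correlation procedure the authors used, so the following Python sketch is an assumption-labeled illustration rather than a reconstruction of their computation; `partial_gamma` and the example tables are hypothetical.

```python
def pair_counts(table):
    """Count concordant and discordant pairs in a table with ordered rows and columns."""
    concordant = discordant = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            n_ij = table[i][j]
            for k in range(i + 1, rows):  # only rows strictly above row i
                for m in range(cols):
                    if m > j:
                        concordant += n_ij * table[k][m]
                    elif m < j:
                        discordant += n_ij * table[k][m]
    return concordant, discordant


def partial_gamma(strata):
    """Pooled within-strata gamma.

    Each stratum is a contingency table for the two variables of interest,
    built only from modules in one class of the control variable, so pairs
    that differ on the control variable are never compared.
    """
    concordant = discordant = 0
    for table in strata:
        c, d = pair_counts(table)
        concordant += c
        discordant += d
    if concordant + discordant == 0:
        raise ValueError("partial gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical 2-by-2 strata (rows: strength low/high, columns: cost low/high),
# one table per class of the control variable (module size).
small = [[3, 1], [1, 3]]
large = [[2, 2], [2, 2]]
print(round(partial_gamma([small, large]), 3))  # 0.444
```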

Controlling for module size, the correlation between module strength and cost increases in magnitude from -0.19 to -0.27 and becomes significant. Controlling for module strength, the correlation between module size and cost increases in magnitude from -0.31 to -0.38. These results imply that, overall, high-strength modules (usually small) tend to be low cost but that large modules also tend to be low cost (independent of module strength). Another study [11] identified a similar relationship between module size and cost for a very different type of software.

One previous study [12] that found a lower fault rate for larger modules based its conclusions on the behavior exhibited by a small sample of large modules. Another study [10] applied parametric regression to a larger sample from the same data base as this study. As discussed earlier, that statistical approach is inappropriate for non-normally distributed data.

Although these results contradict the two previous studies of fault rate, the current results appear to be more robust.

Thus far, the potential effects of programmer performance have been ignored. Lines 3 and 6 of Table 3 show the correlations between the modularization criteria and quality measures obtained while controlling for the effect of programmer performance. (The interaction of module size and strength is, however, no longer controlled.) The large changes from the initial correlations demonstrate that programmer performance interacts with both module size and strength. The loss of significance in the relationships of module strength with module cost and fault rate indicates that these relationships exist because high-strength modules are associated with programmers who produce modules that cost less and have low fault rates.

Figure 3. Fault Rate for Classes of Module Strength

Figure 4. Development Cost for Classes of Module Size

Programmer-Specific Results

The effect of programmer performance was also examined in a subsequent analysis. Of the 26 programmers in the sample, 16 developed 9 or more modules. Together these programmers accounted for 413 of the total 453 modules. The performance of these programmers was reanalyzed using nonparametric correlation [9] to better define the relationship of programmer performance to modularization criteria. Table 4 summarizes the data obtained from the 16 programmers.


Table 4. Programmer Data Summary

PROGRAMMER   NUMBER OF         MEAN EXECUTABLE   MEAN DECISIONS PER
             FORTRAN MODULES   STATEMENTS        EXECUTABLE STATEMENT

A            46                46                ?
B            25                45                0.35
C            25                57                0.40
D            28                ?                 0.33
E             9                53                0.23
F            54                51                0.35
G            24                62                0.32
H            30                71                0.31
I            18                77                0.29
J            50                56                0.26
K            17                47                0.41
L            40                48                0.31
M            13                64                0.39
N             9                ?                 0.33
O            16                38                ?
P             9                53                0.34


For each of these programmers, the percent of zero-fault and low-cost modules was computed. Table 5 shows the correlations (by programmer) between the modularization criteria and the quality measures. Programmers who produce low-fault-rate modules (i.e., "good" programmers) tend to produce high-strength modules. Good programmers do not, however, appear to have any preference for a particular module size. The lower significance levels associated with the correlation coefficients result from the reduction in sample size produced by studying 16 programmers instead of 453 modules.


Table 5. Nonparametric Correlation Results (by Programmer)

CRITERIA

CORRELATIONS (a)

FAULT RATE

COST RATE

MODULE STRENGTH
MODULE SIZE

-0.53 (b)

-0.17

-0.29

-0.18

(a) SPEARMAN CORRELATION COEFFICIENT.
(b) PROBABILITY LESS THAN 0.06 THAT CORRELATION IS ACTUALLY ZERO.
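The Spearman coefficient in Table 5 is the ordinary (Pearson) correlation computed on ranks rather than raw values, which suits small, non-normal samples such as 16 programmers. The Python sketch below is a minimal implementation, assuming one summary value per programmer for each variable (for example, percent of high-strength modules and percent of zero-fault modules); `average_ranks`, `spearman`, and the usage data are hypothetical and are not the SEL measurements. Tied values receive the mean of their ranks, and no significance test is computed.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-programmer percentages (not the SEL data).
pct_high_strength = [20, 35, 50, 65, 80]
pct_zero_fault = [25, 30, 28, 45, 60]
print(round(spearman(pct_high_strength, pct_zero_fault), 3))  # 0.9
```

The sign of the coefficient depends on the direction in which each variable is measured, so a correlation against fault rate and one against percent of zero-fault modules would carry opposite signs for the same data.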
Figure 5 illustrates the relationship between module strength and the fault rate. Although the trend is clear, a great deal of unexplained variation is also present. Good programming consists of more than just writing high-strength modules.


• Overall, large modules cost less (per executable statement) than small modules.

• Fault rate is not directly related to module size.


Figure 5. Module Strength and Faults by Programmer (horizontal axis: percent high strength)


CONCLUSIONS 

The preceding discussion examined the relationship between modularization criteria and quality measures from two perspectives: their overall effect and the contribution of individual programmer performance. Conclusions based on the contingency table analysis (lines 2 and 5 of Table 3) are correct as stated. Finding that programmer performance accounts for some of the strength of these relationships does not affect their validity. However, this result does highlight the difficulty of separating the effects of programmer performance from those of technology or methodology [13]. Furthermore, it enables us to learn about software development in the way that Soloway [14] prescribes, by observing what good programmers do. Conclusions based on the preceding analysis are as follows:

• Good programmers tend to write high-strength modules.

• Good programmers show no preference for any specific module size.


These conclusions suggest that module size should not be arbitrarily limited by any programming standard. Two-thirds of the modules in this sample fell within the local size guideline of two pages (about 64 executable statements), even though this is not an enforced standard.

As noted by Bowen [4], the application of a good design methodology usually results in modules well below the common size limits.

Generally, programmers should be encouraged to write high-strength modules but to make those modules large enough to encompass an entire function. Because low-strength modules are likely to be larger than average, a module size criterion may have an indirect favorable effect on the fault rate. However, the cost advantages associated with larger modules dictate that large, high-strength modules must also be acceptable. Large modules may be appropriate for some types of software (for example, mathematical algorithms).

Programmers, especially the less experienced ones, should be encouraged to write high-strength modules because this is a characteristic of successful programmers. The further development of objective measures of module strength may make this criterion more palatable to organizations that use formal quality assurance procedures. A better measure of module strength should show an even higher correlation with fault rate. In the interim, a checklist of the number of types of functions performed can provide a simple but effective assessment of strength for quality assurance purposes.